RightNow Technologies Candidate Questionnaire

1. Imagine you had complete control over your surroundings and could create the ultimate employment situation.

a. Describe your ideal job.
Working as part of an R&D group on core technology issues related to natural language processing and text mining, developing research prototypes, and working with product groups to transfer the successful approaches to products.

b. Describe the specific technologies you would be using.
An object-oriented language, such as C++ or Java, for developing the main research prototypes, and Perl for text processing and higher-level scripts.

c. What percentage of time would you spend on heads-down technical work vs. other?
Ideally, 70% on heads-down, individual technical work and 30% on other work, where I interpret "other" as meeting with fellow R&D and product group members.

d. Describe the work environment.
I consider myself a team player and would thrive in an environment where communication and collaboration are actively pursued: a relaxed environment with few layers of management, where initiative is encouraged.

2. Describe a past technical achievement of which you are especially proud.
Three years ago, as I was beginning to formulate my research, I came across a fundamental problem in clustering: how to combine different clustering systems. I thought it was an under-explored but very important problem in data mining. My advisor could not help me much in this domain, since it was not her main area of expertise, so I set out to explore it myself, from the conception of the algorithms to their implementation. My work was accepted at a highly selective conference (PKDD 2004, 581 submissions, 18% acceptance rate) and was honored with the Best Student Paper Award.

3. On a scale of 1 – 5 (1=no knowledge, 5=expert, demonstrated by significant experience), please rate your knowledge of the following technologies:

Technology                                        Rating   Years Experience
C                                                 5        8
C++                                               4        5
UNIX software development                         5        8
Microsoft Windows software development            2        2
.NET/C#                                           1        0
SQL                                               1        0
Java                                              1        0
Web application (not web page) development        1        0
HTML/DHTML/Javascript                             1        0

Comments?
I have been using Perl as a scripting language for the past 8 years, both for text processing tasks and for writing scripts that control the execution of research software. I have also been using Matlab for the past 8 years. In addition, I have experience in developing code to process large amounts of data, which requires writing code that runs in parallel on multiple machines using pmake and rexport.

4. In your most recent work experience, describe your role – manager, project lead, member of a small team, member of a large team or individual contributor. Which role do you prefer?
I have always either been a member of a small team or worked individually. I would prefer to work as a member of a small team.

5. Why should RightNow Technologies hire you over other candidates?
Because I can bring the interdisciplinary knowledge that this R&D position requires. Working in R&D means having the foundations to offer solutions across a range of different applications. I have a publication record in natural language processing, data mining and speech recognition, all based on statistical learning algorithms. And because I love pursuing and implementing research ideas, and I enjoy working very hard to bring them to fruition.

6. Are you legally eligible for employment in the United States?
Yes, I am. I currently hold an F-1 student visa, which can be extended through Optional Practical Training (OPT).

Applied Research Candidate Questionnaire

1. What is the largest software project you have worked on, both in number of team members as well as (approximate) number of lines of code?
The SRI Decipher system, a large-vocabulary speech recognition system of tens of thousands of lines of code, and the Microsoft Research Whisper system, a similar large-vocabulary speech recognition system, again of tens of thousands of lines of code. In terms of team members, I have worked with about 1-2 people.

2. Describe your experiences with handling data going into and out of a database at the code level.
I have been using standard tools for this, such as cvs and rcs for Unix.

3. Describe your debugging skills (tools used, processes, etc):
I have been using the gdb debugger as well as some of the more recently available graphical environments (Eclipse). I have also used Microsoft Visual C++ under Windows. The processes involved are informal, such as emailing or talking to people.

4. Describe the most difficult bug you solved, and what made the debugging process particularly hard.
Some years ago, while developing software on HTK, a software toolkit for research prototype development in speech recognition, I found that the order in which the speech models are stored in the file can actually change the results of the experiment. This was a very hard bug to find because it was the last thing anyone would expect.

5. On a scale of 1 – 4 (1=no knowledge, 2=research or coursework, 3=prototype, 4=shipped production code), please rate your knowledge of the following technologies:

Technology                                        Rating   Published (yes/no)   Years Experience
Information Retrieval                             2        no                   1
Natural Language Processing                       3        yes                  5
HTTP/spidering                                    1        no                   0
Text Clustering                                   3        yes                  5
Text Classification                               3        yes                  5
Text Summarization                                2        no                   1
Swarm Intelligence/Ant Colony Optimization        1        no                   0
Collaborative Filtering                           2        no                   1
Ontology/Topic extraction                         1        no                   0
Data Mining                                       3        yes                  5
Machine Learning                                  3        yes                  5

Please expand on individual technologies where appropriate:

6. What is the most interesting emerging trend in any of the above areas?
One of the most fascinating trends is the move from supervised to semi-supervised and unsupervised approaches. Collecting, annotating and cleaning training data are by far the most expensive steps in developing new NLP applications, so ways to reduce the cost of these steps are crucial. Another fascinating trend is dealing with huge amounts of data, which raises issues of scalability and speed.