DS-200: Data Science Essentials Beta Exam
Posted: Jul 18, 2014
The DS-200: Data Science Essentials Beta Exam certifies successful candidates. Exam takers should have solid knowledge of, and practical skill in, the exam topics listed in this article, which also names the study resources for each topic.
The DS-200: Data Science Essentials Beta Exam topics are Data Acquisition, Data Evaluation, Data Transformation, Machine Learning Basics, Clustering, Classification, Collaborative Filtering, Model/Feature Selection, Probability, Visualization, and Optimization.
Candidates who want more than the main topic list when preparing for the DS-200: Data Science Essentials Beta Exam can use the paragraphs below, which describe the exam topics in detail along with the study resources recommended by the vendor.
Data Acquisition covers accessing and loading data from a variety of sources into a Hadoop cluster, including databases and systems such as OLTP and OLAP as well as log files and documents; deploying a variety of acquisition techniques, including database integration and working with APIs; and using command-line tools such as wget and curl. Candidates can prepare with the following resources: Hadoop tools such as Sqoop and Flume, Apache Sqoop, Aaron Kimball on Sqoop, Apache Flume, Cloudera's blogs on Apache Flume, Cloudera's blogs on data collection, and HDFS File System.
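As a rough illustration of the command-line acquisition step, the sketch below does in Python approximately what a plain wget or curl download does: it streams a URL into a local file that could later be loaded into the cluster. The function name `fetch_to_file` and the URL and file names in the comments are hypothetical and are not part of the exam material.

```python
import os
import shutil
import urllib.request


def fetch_to_file(url, dest_path, chunk_size=64 * 1024):
    """Stream the resource at `url` into `dest_path`; return bytes written."""
    # Copy in chunks so that large log files never have to fit in memory.
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return os.path.getsize(dest_path)


# Hypothetical usage (the URL and file name are placeholders):
# fetch_to_file("http://example.com/logs/access.log", "access.log")
```

A file downloaded this way would typically be copied into HDFS afterwards, for example with a command such as `hadoop fs -put`.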
The DS-200: Data Science Essentials Beta Exam also covers Data Evaluation, which includes knowledge of the file types commonly used for input and output and the advantages and disadvantages of each; methods for evaluating data from the command line and at scale; sampling and filtering techniques; and familiarity with Hadoop SequenceFiles and serialization using Avro. Candidates can prepare with Hadoop: The Definitive Guide, 3rd Edition, Hadoop in Practice, Apache Avro, and Cloudera's blogs on Apache Avro.
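The sampling and filtering objective can be illustrated with a small sketch. The article does not say which sampling technique the exam expects; reservoir sampling is used here only as one common way to draw a fixed-size uniform sample from a stream of unknown length. The helper names `drop_blank` and `reservoir_sample` are made up for this example.

```python
import random


def drop_blank(lines):
    """Filter step: yield only lines that contain non-whitespace text."""
    for line in lines:
        if line.strip():
            yield line


def reservoir_sample(records, k, seed=None):
    """Return up to k records chosen uniformly at random from an iterable.

    Only k records are held in memory at a time, so the input can be a
    stream that is far larger than RAM.
    """
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = record
    return sample
```

For instance, `reservoir_sample(drop_blank(open("events.log")), 1000)` would give a 1,000-line sample of a hypothetical log file without reading it all into memory.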
Data Transformation covers writing a map-only Hadoop Streaming job; writing a script that receives records on stdin and writes them to stdout; invoking Unix tools to convert file formats; joining data sets; writing scripts to anonymize data sets; writing a Mapper in Python and invoking it via Hadoop Streaming; writing a custom subclass of FileOutputFormat; and writing records into a new format such as AvroOutputFormat or SequenceFileOutputFormat. Candidates can prepare with Hadoop Streaming, the Hadoop Streaming wiki, Apache Hive, the Hive tutorial, the Hive language manual, the Hive joins documentation, Apache Pig, Pig's relational operators, the Cloudera blog on Python frameworks for Hadoop, and Hadoop: The Definitive Guide, 3rd Edition.
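Several of these objectives fit into one short script. The following is a minimal sketch of a Python mapper for a map-only Hadoop Streaming job: it reads records on stdin, replaces one field with a salted hash, and writes the result to stdout. The record layout (tab-separated, identifier in the first column), the `SALT` value, and the function names are assumptions made for illustration, and salted hashing is only one simple way to mask identifiers.

```python
import hashlib
import sys

SALT = "change-me"   # hypothetical secret; keep the real value out of source control
ID_COLUMN = 0        # assumed position of the identifier in each record


def anonymize(value, salt=SALT):
    """Return a stable, non-reversible token for an identifier."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def run(lines, out):
    """Read tab-separated records from `lines` and write masked records to `out`."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank or malformed records that have no identifier.
        if len(fields) <= ID_COLUMN or not fields[ID_COLUMN]:
            continue
        fields[ID_COLUMN] = anonymize(fields[ID_COLUMN])
        out.write("\t".join(fields) + "\n")


if __name__ == "__main__":
    # Hadoop Streaming feeds input splits on stdin and collects stdout.
    run(sys.stdin, sys.stdout)
```

With Hadoop Streaming, a script like this is passed through the `-mapper` option, and setting the number of reduce tasks to zero makes the job map-only. The same script can be tried locally by piping a file into it.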
The next DS-200: Data Science Essentials Beta Exam topic is Machine Learning Basics, in which candidates learn about using Mappers and Reducers to create predictive models; the different kinds of machine learning, including supervised and unsupervised learning; and the uses of parametric and non-parametric algorithms, support vector machines, kernels, neural networks, clustering, dimensionality reduction, and recommender systems.
Clustering covers defining clustering and identifying appropriate use cases; similarity metrics, including Pearson correlation, Euclidean distance, and block distance; and the algorithms applicable to each model (k-means, SVD/PCA, etc.).
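The three similarity metrics named above can be written out directly. This is a plain-Python sketch for two equal-length numeric vectors; the function names are chosen for this example, and returning 0.0 for a constant vector in `pearson_correlation` is a convention assumed here (the correlation is mathematically undefined in that case).

```python
import math


def _check(a, b):
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")


def euclidean_distance(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    _check(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def block_distance(a, b):
    """Block (Manhattan) distance: sum of the absolute differences."""
    _check(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson_correlation(a, b):
    """Linear correlation in [-1, 1]; 0.0 if either vector is constant."""
    _check(a, b)
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)
```

For the points (0, 0) and (3, 4), the Euclidean distance is 5 and the block distance is 7. Pearson correlation ignores scale, so the ratings [1, 2, 3] and [2, 4, 6] correlate perfectly even though both distances between them are non-zero.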
Classification covers the following objectives: training on a set of data in order to identify new data based on known data; use cases for logistic regression and Bayes' theorem; and classification techniques and formulas. Candidates can prepare for these objectives with Programming Collective Intelligence, Algorithms of the Intelligent Web, and Mahout in Action.
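To make the Bayes' theorem objective concrete, here is a small worked sketch for a two-class problem. It computes P(class | evidence) = P(evidence | class) P(class) / P(evidence). The function name and the spam-filter numbers in the comments are invented purely for illustration.

```python
def posterior(prior, likelihood, false_alarm_rate):
    """Bayes' theorem for a binary class.

    prior            -- P(class)
    likelihood       -- P(evidence | class)
    false_alarm_rate -- P(evidence | not class)
    Returns P(class | evidence).
    """
    # Total probability of seeing the evidence under either class.
    evidence = likelihood * prior + false_alarm_rate * (1.0 - prior)
    if evidence == 0:
        raise ValueError("evidence has zero probability under both classes")
    return likelihood * prior / evidence


# Hypothetical numbers: 20% of mail is spam, a given word appears in 90% of
# spam and in 10% of legitimate mail.
p_spam_given_word = posterior(prior=0.2, likelihood=0.9, false_alarm_rate=0.1)
# 0.18 / (0.18 + 0.08), about 0.692
```

A naive Bayes classifier applies the same update for each feature, under the assumption that features are independent given the class.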
CertifyGuide provides full preparation material to help candidates pass the exam on the first try, using our DS-200 Latest Exam and CCA-410 Study Guides.