Real-time project as an ETL Informatica developer in an organization
Posted: Nov 16, 2018
Iteration 2: Analysis
At this stage, I would know the culture of the organization and its work environment, and I would shift my focus to understanding and analyzing the different ETL tool components used for managing data within the organization. Understanding the tools would enable me to use them when performing my tasks as an ETL developer. By the end of the second iteration, the business requirement documents would be thoroughly reviewed and analyzed for further modifications. The main idea was to master all the activities connected to the previous iteration (K. Ashok, Personal Communication, January 18, 2016).
In preparing for the iteration on the analysis of the ETL tool components, I had the mandate of gathering all the information regarding the selection of the best ETL tooling for the execution of my duties at Logic Planet Consulting Services, Inc as an Informatica developer. I planned to meet the company’s Informatica developers who were knowledgeable about the ETL tools in one of the company’s halls, with follow-up meetings up to February 01, 2016, when the iteration was expected to end. The developers had experience in the execution of ETL processes, including data extraction, data loading, integration, and workflow management (K. Ashok, Personal Communication, January 18, 2016). I planned to gain first-hand insight into the characteristics of the ETL tools and the way they were used in the organization.
In this iteration, the topics that I expected to cover included the architecture, router transformations, Informatica PowerCenter, and qualifier transformations. I also wanted to become conversant with the various business components that play a key role in the development of the data warehouse. I planned to use this session to inquire how I would design the ETL workflow monitoring that would take place in the next iteration, and to analyze the requirements for the project at hand. I planned to make the necessary inquiries from the company’s system analyst and the project manager, who would help me gain in-depth knowledge of the process of reviewing and analyzing the project requirements documentation. That would help me understand how to develop the project objectives before embarking on the task assigned to me (K. Martial, Personal Communication, January 18, 2016).
On January 18, 2016, I met the company’s ETL developers in the Orange Hall, which was used for such meetings. The aim of the meeting was to gain a thorough knowledge of the ETL tool components that the company uses in the execution of its ETL processes and tasks. Before the meeting commenced, the program was distributed, showing how the training would be conducted and which personnel would handle each part of the training on the ETL components. The tools on the list included Informatica PowerCenter, PowerConnect, PowerMart, PowerExchange, Power Analysis, and Informatica Power Quality (M. Nancy, Personal Communication, January 22, 2016). The training on the tools commenced that same Monday, and one of the senior developers, Mrs. Nancy, trained on Informatica PowerCenter and Informatica PowerMart. She taught how to use the tools in processing data, how to create repositories, and how to convert a local repository to a global repository.
Mr. Charles taught about the mapping of components on the third day of the first week. A mapping defines how data is extracted, transformed, and loaded. He also trained on the components of a mapping, including the source definition, the target definition, and the transformation logic (ADBIS et al., 2015). The ETL mapping looks as in the following diagram.
Mrs. Hannah also talked about the following components: the PowerCenter Designer, the Workflow Manager, the Workflow Monitor, and the Repository Manager, before Mr. Rajesh trained on the process of reviewing and analyzing the requirements documentation. I learned how to use the Informatica PowerCenter Designer to design the ETL process, which is also known as a mapping. I also learned how to leverage the PowerCenter Workflow Manager to create the PowerCenter objects, including the sessions for each mapping and the workflows that start those sessions (A. Hannah, Personal Communication, January 25, 2016).
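The three parts of a mapping named in the training (source definition, target definition, and transformation logic) and the idea of a session that runs one mapping can be sketched in plain Python. This is only an analogy and not the PowerCenter API; the class `Mapping`, the function `run_session`, and all field names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Mapping:
    """Ties a source definition to a target definition via transformation logic."""
    source_fields: List[str]         # source definition (hypothetical field names)
    target_fields: List[str]         # target definition (hypothetical field names)
    transform: Callable[[Row], Row]  # transformation logic


def run_session(mapping: Mapping, source_rows: List[Row]) -> List[Row]:
    """Run one mapping: extract the declared fields, transform, load target fields."""
    loaded = []
    for row in source_rows:
        extracted = {f: row[f] for f in mapping.source_fields}
        transformed = mapping.transform(extracted)
        loaded.append({f: transformed[f] for f in mapping.target_fields})
    return loaded


def to_customer_dim(row: Row) -> Row:
    """Example transformation: rename the key and build a trimmed full name."""
    full_name = str(row["first"]).strip() + " " + str(row["last"]).strip()
    return {"customer_id": row["id"], "full_name": full_name}


customer_mapping = Mapping(
    source_fields=["id", "first", "last"],
    target_fields=["customer_id", "full_name"],
    transform=to_customer_dim,
)
```

Calling `run_session(customer_mapping, rows)` on a list of source rows returns the rows shaped for the target, which mirrors how a session executes the logic that a mapping describes.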
I observed that many ETL tools have been developed to simplify the work of the data warehouse developer. Initially, developers had to write the SQL code by hand, but the emergence of data warehouse development tools has enabled developers to simply drag and drop items while developing the data warehouse (Wang, 2007). I also observed that the ETL tools available in the market have expanded their functionality beyond data warehouse development and ETL. The ETL tools in use nowadays offer extended functionality for data cleansing, profiling, big data processing, enterprise application integration, master data management, and data governance (J. Charles, Personal Communication, January 20, 2016). I also observed that the primary function of ETL and data integration software is the extraction, transformation, and loading of data. The ETL tools alone are not enough, so enterprises also need business intelligence tools for analyzing and visualizing the data once it has been made available in the OLAP cube or the data warehouse.
I observed that the ETL tool is very useful in creating the ETL processing that makes the management of data an easy task. Developers have to be conversant with the ETL tool’s components if they are to execute their tasks effectively. I observed that there are many tools that the Informatica developer needs to learn for data extraction, data transformation, and data loading, and that the developer also needs to learn how to apply those tools. I observed that the trainers in this iteration had thorough knowledge and experience of all the functions of the ETL tool, and the way they structured the analysis session contributed to its success. I also observed that the analysis and review of the requirements documentation help in understanding the project at hand and the issues that require resolution (K. Ashok, Personal Communication, January 28, 2016).
Knowledge of the ETL tools and of the ETL tool components used in the execution of the required tasks is paramount to the success of the Informatica developer. The Informatica developer has the mandate of extracting data from various sources such as databases, systems, and applications, then transforming it and loading it into another database, data mart, or data warehouse before analyzing it (Albrecht & Naumann, 2008). The process cannot practically take place manually; it requires the ETL tool. The iteration helped me to be well informed about these tasks through the use of the ETL tool. The trainers gave me first-hand knowledge of the execution of the ETL process using the various ETL tool components. The iteration helped me to meet all my objectives for this session except for a few. The company used only trainers who were well versed in the ETL tool and processes, and that is the reason the iteration was accomplished with little difficulty (J. Charles, Personal Communication, January 29, 2016).
The iteration gave me knowledge of the ETL processes, but since there were no real-time examples to work on, I could not acquire the practical experience that I needed from this session. I suggest that the company put in place real-time example tasks on which the intern can work so as to enhance his/her understanding during training. The other thing that did not go well in this session is that the trainers did not cover other ETL tools, because they thought that as an ETL Informatica developer I only needed knowledge of the Informatica tool. I suggest that the company include the other ETL tools in the training so that the intern can gain knowledge and experience of those tools as well.
Iteration 3: Design
In the third iteration, I would start the process of creating the ETL application workflows and reduce errors day by day while working. I would design the different mappings and sessions that I should be performing in the organization (K. Ashok, Personal Communication, February 02, 2016). I would ensure that I was capable of completing all the tasks assigned to me by the team leader within the required time frame. I would prepare myself for tackling any tasks in my line of duty with the help of the supervisor who would be assigned to me throughout this iteration.
My plan for the design of the ETL workflows entailed meeting the company’s ETL Informatica developers and the project manager in a brainstorming session, where they would share their knowledge of how to conduct the task at hand. The brainstorming meeting would take place in the brainstorming room and would run for two days, from February 02, 2016 to February 03, 2016. The resource persons had thorough training, experience, and skills in the creation of ETL processes, and their experience would be an advantage to me as a young ETL Informatica developer. I planned to use the session to prepare adequately for the design task at hand so as to accomplish it with less difficulty. I planned to gain a thorough knowledge of the source-ETL design considerations, including populating the calendar data via the calendar population scripts and populating the tables in the right order.
The brainstorming session would help me understand how to accomplish my tasks, such as the configuration of the data population building blocks for delivering the data acquisition services effectively. I planned to understand and execute the communication model using the architecture that best meets the company’s business needs. I also planned to understand and carry out the creation of the source-to-destination documents for the source ETL. I planned to design a plan for rectifying the data quality problems of the source ETL. I also planned to design the source ETL job control as well as the ETL workflow using the tips provided in the brainstorming session. I also planned to design the ETL exception handling effectively, as taught in the brainstorming session. The last thing that I wanted to do in this session was to write source-ETL that can load efficiently.
I had a brainstorming session with the company’s project manager as well as the ETL developers in the company’s brainstorming hall, and many ideas were shared on how to structure the design phase and the activities I was to carry out. The experts also assigned one of the junior developers to help me with any difficult task that I might encounter during the execution of my duties so that I would accomplish the required tasks with ease. I learned many things from the brainstorming session, and it gave birth to the plan of the design phase that I was to execute. I learned and implemented the population of the calendar data using the population scripts that are provided with the data communication model (J. Seth, Personal Communication, February 05, 2016). I populated the tables beginning with the lookup tables, then the reference tables, and lastly the base tables. The ETL first extracts the data from various sources, assures the quality of that data, cleanses it, and then makes it consistent across all the original sources.
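The load order described above (lookup tables, then reference tables, then base tables) can be expressed as a small ordering rule. The sketch below is a minimal illustration in Python; the table names and the `kind` labels are hypothetical, and the rationale in the comments (dependent tables load after the tables they refer to) is a general assumption rather than something stated by the trainers.

```python
# Assumed rationale: base tables usually refer to rows in lookup and
# reference tables, so those need to be populated first.
LOAD_ORDER = {"lookup": 0, "reference": 1, "base": 2}


def order_for_load(tables):
    """Sort (table_name, kind) pairs into load order.

    sorted() is stable, so tables of the same kind keep their given order.
    """
    ordered = sorted(tables, key=lambda table: LOAD_ORDER[table[1]])
    return [name for name, _kind in ordered]


# Hypothetical tables, listed in arbitrary order.
tables = [
    ("orders", "base"),
    ("country_codes", "lookup"),
    ("customers", "reference"),
    ("status_codes", "lookup"),
]
```

Passing `tables` to `order_for_load` yields the lookup tables first, followed by the reference table and finally the base table.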
After cleansing the data, I had to populate the physical objects with the cleansed data so that the report writers, the dashboards, the query tools, and so on could access the data. I performed the sourcing, the movement, the transformation, and the data loading activities, which are the fundamental services for constructing data acquisition. Before I could begin building the extract systems, I had to come up with a logical data interface document that maps the original fields to the target fields in the tables (K. Martial, Personal Communication, February 10, 2016). I designed a plan for dealing with the data quality issues that covered data validity, data accuracy, data consistency, data latency, and data reasoning. I then designed the source ETL job control using the tips gained from the brainstorming session, such as using a common structure for all jobs, maximizing the parallel execution of jobs, and using a one-to-one mapping technique from the source to the destination, among other tips.
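A logical data interface document of the kind described above is, at its core, a table of source fields and the target fields they feed. The following Python sketch shows one way to hold such a mapping and to check the one-to-one rule mentioned in the tips. The field names in `FIELD_MAP` are hypothetical, and the helper functions are illustrative rather than part of any ETL tool.

```python
# Hypothetical source-to-target field mapping (source field -> target field).
FIELD_MAP = {
    "cust_no": "customer_id",
    "cust_nm": "customer_name",
    "dob": "birth_date",
}


def is_one_to_one(field_map):
    """True when no two source fields feed the same target field."""
    return len(set(field_map.values())) == len(field_map)


def apply_field_map(row, field_map):
    """Rename the fields of one source row according to the mapping."""
    missing = [source for source in field_map if source not in row]
    if missing:
        raise KeyError(f"source fields missing from row: {missing}")
    return {target: row[source] for source, target in field_map.items()}
```

Checking `is_one_to_one(FIELD_MAP)` before any job runs catches a mapping document in which two source fields would overwrite each other in the target.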
The observations that I made from the design iteration are twofold: those regarding the brainstorming session and those regarding the techniques of designing and executing the ETL processes. I observed that experts used brainstorming as a method for generating ideas and solving problems (A. Hannah, Personal Communication, February 01, 2016). I observed that many brainstorming sessions were flawed due to the lack of clear goals and a plan. I observed that designing the ETL processes was tedious, and I had to work extra hours in the office to make sure that I completed each day’s tasks and planned for the subsequent ones. The use of ETL in the Oracle Communications Data Model played a big role in the warehouse environment. I observed that there were three layers in the Oracle Communications Data Model that the ETL tool populates: the staging layer, the foundation layer, and the access layer. The diagram below illustrates the flow of the ETL process.
I observed that there were two types of ETL used for populating the communication model: the source ETL and the intra-ETL (Ankorion, 2005). The source ETL populates the staging layer as well as the foundation layer of the data communication model with data from the source system. The intra-ETL, on the other hand, populates the access layer with data from the foundation layer. The access layer includes the aggregate tables, the OLAP cubes, the derived tables, the data mining models, and the materialized views (K. Martial, Personal Communication, February 13, 2016). I observed that the design of the ETL process entailed the design of the source ETL as well as the intra-ETL. The design entailed the creation of a source-to-target mapping document, a plan for rectifying data quality problems, the design of exception handling, and the design of ETL job control, among others.
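The division of labor between the two ETL types can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the data model’s actual scripts: the staging layer is treated as a raw copy of the source rows, the foundation layer as the cleansed rows, and the access layer as a single aggregate. The field names `customer` and `amount` are hypothetical.

```python
from collections import defaultdict


def source_etl(source_rows):
    """Source ETL: fill the staging layer, then the foundation layer."""
    staging = list(source_rows)  # staging: raw copy of the source data
    foundation = [
        {"customer": row["customer"].strip().title(), "amount": float(row["amount"])}
        for row in staging
        # Skip rows that lack a customer or an amount.
        if row.get("customer") and row.get("amount") not in (None, "")
    ]
    return staging, foundation


def intra_etl(foundation):
    """Intra-ETL: derive an access-layer aggregate from the foundation layer."""
    totals = defaultdict(float)
    for row in foundation:
        totals[row["customer"]] += row["amount"]
    return dict(totals)
```

The point of the split is that `intra_etl` never reads the source system directly; it only consumes what `source_etl` has already placed in the foundation layer.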
The iteration on design was very helpful to me as a young ETL Informatica developer, as I learned many things that I did not know before. I came to understand the two types of ETL, the source ETL and the intra-ETL, and how each is designed. The brainstorming session gave me a wealth of knowledge on how to carry out the design of these two types of ETL processes. I learned how to design the ETL workflow as well as the job control. It also helped me understand how to make a design that handles exceptions and resolves data quality problems. The tasks gave me real-time experience of the design of ETL processes because I had to work on the actual systems, as shown by the developers in the previous meetings.
I made sure that I loaded data efficiently while developing the mapping scripts and loading the data into the various layers of the data communication model, because data had to get into the warehouse as expediently as possible. However, although most things went as seamlessly as anticipated, I was given too little time to work on the ETL design project, and that made me work extra hours to complete the tasks for each day. The supervisor assigned to me was also unavailable most of the time. My suggestion is that the company should dedicate enough time to every activity, bearing in mind that the intern is not well versed in most of the technical activities. The assigned supervisors should also be available for consultation throughout.
Iteration 4: Implementation
Implementation would be the second last iteration. At this stage, the code is implemented. I would review the code and remove errors (K. Ashok, Personal Communication, March 02, 2016). At this stage, I would also study the ETL workflow. I would first put the focus on the extraction, and then get to understand data transformation and how the data are to be loaded. Since I am an intern, I would not be allowed to perform a complex task; hence, most of the time I would learn through observation.
My plan for this iteration on implementation included the resolution of any issues that might have arisen while designing the ETL processes and loading the data from the original sources to the destinations. I planned to review the data again at every step to make sure that the data in the original sources as well as in the target database was clean and free of errors. I planned to review the ETL workflow with the help of the other ETL Informatica developers to make sure that the known business rules had been applied correctly to make the data consistent. Errors introduced into the data reduce its quality, so my plan was to examine the data thoroughly and eliminate any data quality issues. There were tasks in which I would not be actively involved, so in those areas I planned to observe keenly and make notes as appropriate.
I also planned to study the ETL workflow, whereby I would first focus on the extraction before getting to understand data transformation and how the data are to be loaded. The review of the workflow would entail reviewing the sessions and the way the workflow is monitored, and that would provide me with more knowledge and experience as a young ETL Informatica developer. I planned to view the workflow and session states, and to review the fetching of session logs from the repository and the management of the PowerCenter repository. I planned to review whether the relevant data was being extracted and how the transformation of data was taking place, including the building of keys, the cleansing of data, and the correctness of the format. For the loading of data, I would examine how the loading into the data warehouse takes place and how the aggregates are built.
On March 02, 2016, the day the implementation was to take place, I had a meeting with the company’s ETL Informatica developer and the system administrator in the system administrator’s office, where the servers were also housed. The aim of this meeting was to remove any errors introduced into the data while I was carrying out my tasks in the previous iteration and to review the workflow processes in order to optimize them. We began by examining the quality of the data, addressing the data quality issues to make sure that the data was valid, consistent across all the original data sources, complete, and accurate. The examination of data validity involved examining the content and sufficiency of the data to ensure that it met expectations (K. Martial, Personal Communication, March 03, 2016). The examination of data accuracy involved reviewing the correctness of the addresses against the standards defined by the company.
Checking for errors that affected data completeness entailed reviewing whether all the required data was present and updating the database where data was missing. In examining errors that pertained to data consistency, we applied the various rules for eliminating inconsistency to make sure that the data was consistent across all the sources. We also examined data reasoning, whereby we had to determine whether the data made sense from the business perspective and whether it was possible to combine the data in the way that the end users would expect. I performed most of the tasks regarding the checking of errors in the data, consulting the resource persons with whom I was working. The next task was the review of the workflow so as to optimize the process as desired. The ETL flow that we examined looks as in the diagram below, whereby most of the work was done by the developer.
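The completeness, validity, and consistency checks described above can be illustrated with a few small Python functions. This is a minimal sketch under stated assumptions: the required fields, the email rule, and the two-letter country rule are hypothetical examples of business rules and are not the company’s actual standards.

```python
# Hypothetical list of fields that every record must carry.
REQUIRED_FIELDS = ("customer_id", "email", "country")


def check_completeness(row):
    """Return the required fields that are missing or empty in a row."""
    return [field for field in REQUIRED_FIELDS if row.get(field) in (None, "")]


def check_validity(row):
    """Return the fields whose content breaks the (assumed) format rules."""
    problems = []
    email = row.get("email") or ""
    if email.count("@") != 1:
        problems.append("email")
    country = row.get("country") or ""
    if len(country) != 2 or not country.isalpha() or not country.isupper():
        problems.append("country")
    return problems


def check_consistency(rows_by_source, key, field):
    """Return the keys whose `field` value differs between sources."""
    seen = {}
    conflicts = set()
    for rows in rows_by_source.values():
        for row in rows:
            record_key = row[key]
            if record_key in seen and seen[record_key] != row[field]:
                conflicts.add(record_key)
            seen.setdefault(record_key, row[field])
    return sorted(conflicts)
```

Each function reports problems instead of fixing them, which keeps the decision about how to repair a record with the person responsible for the data.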
From the implementation phase, I made several observations about checking for errors and reviewing the ETL workflow. I observed that data is vital to any organization, and it should be handled carefully so that it remains consistent and free of errors. Errors in the data or in the ETL process can make some operations on the data impossible. The examination of errors in the various sources involved examining the data quality issues to make sure that the data was accurate, valid, complete, had acceptable latency, and was relevant to the business (J. Seth, Personal Communication, March 10, 2016). I also observed that for the data to be free of errors, it had to be extracted, transformed, and loaded in the right manner, and it also had to conform to the laid-down rules and standards. I observed that data management was carried out by as few people as possible, because involving many people would introduce errors and inconsistencies that might prove expensive to eliminate.
I observed that the system administrator and the Senior ETL Informatica developer underwent training from time to time to make sure that they gained the knowledge and skills required to execute their tasks effectively, as they were charged with the company’s data stewardship. I observed that any mistake in data management was a punishable offense, and for that reason the data stewards did not allow me to carry out critical tasks on the data lest I introduce other issues. Consequently, I had to watch the way they performed most of the tasks, and so I lacked the real-time experience that would be helpful in the future as an Informatica developer. I also observed that knowledge of the ETL workflow is crucial for the ETL Informatica developer because it helps them understand how to make the process effective and efficient (M. Nancy, Personal Communication, March 12, 2016).
The iteration went well, and most of the activities that took place were impressive. I gained much knowledge and experience from it and came to appreciate the need for Informatica developers in organizations. I also came to appreciate my career, as I observed how much would be required of me when executing my duties in any organization in which I would work as an ETL developer in the future. During this iteration, the supervisors were available throughout because of the criticality of the data with which we were dealing. They explained to me all the steps that they took, especially in those tasks in which I was not actively involved. They also gave me the chance to ask any questions that I had, particularly about the things that I could not easily comprehend, and to raise any other concerns from the previous iterations.
The iteration had me involved mostly in basic tasks that did not entail the critical functions, because of the sensitivity of the data with which we were dealing. I felt that I did not have the chance to learn many essential activities because I was not involved in those tasks. I would suggest that the company involve the intern in such critical activities, because that would create an avenue for gaining vital experience that would make them effective when handling their daily tasks in the future. The company also had the training program only for the system administrator and the ETL Informatica developer, and I suggest that they involve other staff such as the network administrator, since the security and management of the data are the responsibility of everybody.
Iteration 5: Deployment
Deployment is the last iteration. During this period, the final project is delivered to the end user. In the last days of the iteration, I would get in touch with my director to get feedback on my performance during the weeks that I had been in the organization. Once I knew my performance rating from the team manager, I would work on my weaknesses to improve my knowledge and understanding (M. Nancy, Personal Communication, March 16, 2016). I would deploy by performing the different activities that I had learned in the organization.
I planned to meet the team manager and my supervisors in the Orange Hall so that they would evaluate my activities in the organization, which would help me know my areas of weakness and strength. The meeting would take place over three days, from March 16, 2016, to March 18, 2016, so as to address all the areas of my learning experience in the company. I planned to use the compliments and any criticism that I would receive from these staff members to work on my weaknesses effectively so that I would complete my internship in the company well and without regrets. I planned to solicit the help of the director and the internship coordinator to understand how I would work on my weaknesses so that I would gain as much as possible from the internship at Logic Planet Consulting Services, Inc.
I then planned to request the Senior ETL Informatica developer to allow me to work on my weaknesses by being involved in the actual ETL processes in the organization. I also planned to interview the various ETL developers in the organization so that they would help me understand how to work on my weaknesses effectively and attain the required competency as an ETL Informatica developer. I also planned to search the Web for commonly asked questions regarding ETL development to see whether I could answer them effectively and to note the areas in which I could not perform well. I would then request the company’s ETL Informatica developers to help me address those challenges, and by so doing I would gain in-depth experience in every task.
I requested the ETL Informatica developer team manager and all my supervisors to meet me in the Orange Hall so that the team manager could help analyze the work that I had performed in the organization during my internship. Fortunately, the team manager had been recording everything that I did, including the things that I did well and those that I did not do as well as expected. He highlighted many of the areas in which I performed well and the areas in which I did not. The supervisors also stood one by one and highlighted the areas in which they thought I did not perform well while they were working with me. After that, I met the internship coordinator and the director and asked whether there was any way I could improve on my weaknesses, since there were ten days remaining to complete my internship in the company.
The director gave me room to work on my skills and experience, after which I requested the Senior ETL Informatica developer to allow me to work on the real-time ETL processes so as to improve in those areas in which I did not perform well. The developer also helped me understand how to improve on the weaknesses by telling me what should have been done. The knowledge from that developer was substantial, as I learned what to do and how to do it better, and that made me have less difficulty in executing the duties. I also conducted a search on the Web and then interacted with other developers in the company so that they could respond to the concerns that I had. I made sure that I asked all the questions I had and received appropriate answers from the developer.
I observed that the evaluation of one’s actions is of vital importance and that it is a must while one is doing action research. The evaluation resulted in both positive and negative criticism of my tasks, and that showed me how skilled I was up to this point in my internship (J. Charles, Personal Communication, March 16, 2016). It showed me the areas that I was to improve, and I improved them accordingly. I observed that the company makes sure that the intern learns all they are supposed to learn and that they acquire all the necessary insight (M. Nancy, Personal Communication, March 19, 2016). I also observed that an intern is an indispensable person in any company because they relieve employees of some of their tasks at no cost. I also observed that the company valued the interns highly, and for that reason it made sure that they gained the hands-on experience that would help them execute their duties effectively in the future.
The evaluation made me realize that there were a few areas in which I needed to improve, and so I took the necessary steps towards improving them. I observed that the other developers also had knowledge and experience in ETL processes that could not be ignored, and I wondered why they were not included in many of the company’s critical tasks, including the training of the interns. I observed that these developers were always open to being challenged by the interns, and for that reason they responded to my queries appropriately (A. Hannah, Personal Communication, March 30, 2016). I observed that after I had gained experience, the data stewards did not supervise me closely because they knew very well that I was less likely to repeat the mistakes that I had made earlier.
The last iteration at Logic Planet Consulting Services, Inc gave me good experience of working on ETL processes and workflows. The iteration on deployment is the one that introduced me to the practical experience that I had anticipated in my internship, since I was involved in the actual tasks without close supervision. That gave me the experience of working under minimum supervision while still ensuring that I delivered quality results. I also gained knowledge of time management and project management. The iteration on deployment also gave me the chance to interact with the company’s other ETL developers and gain knowledge that I did not have before. The developers were very willing and eager to address all the concerns that I had, and that gave me the desire to continue learning more about ETL Informatica development.
Despite the fact that the Senior ETL Informatica developer allowed me to conduct the actual tasks on the company’s database and servers, the system administrator did not allow me to execute some other tasks that were very basic. I suggest that the data stewards should work in unison as that can allow the interns to have the knowledge and experience that they anticipate having. I also felt that the company should have allowed the intern to carry out the real-time tasks in the previous iterations as was done during this iteration. The company also only evaluated the actions of the intern in the last iteration. I suggest that the company have a plan for assessing the activities of the interns in each and every iteration before they can proceed with the next iteration.
The internship at Logic Planet Consulting Services, Inc was a captivating one because of the many things that I learned in the process and the experience that I gained. The internship began with the orientation into the company, where I came to understand the principles, standards, policies, procedures, rules, and regulations by which I needed to abide while executing my duties in the company. These helped me to understand how to work so as to accomplish my internship in the company with ease. The second iteration introduced me to the analysis of the different ETL tools and especially the components of the ETL tool. I learned how to leverage those components and accomplish the ETL process effectively with minimum supervision. I also gained experience of analyzing and reviewing the business requirement documents, and that helped me make a good plan that made the design phase successful.
In the third iteration, which entailed design, I learned the process of creating the ETL application workflows and reduced errors day by day while working. I gained problem-solving skills in the process. In the fourth iteration, I learned how to eliminate errors in the data and make sure that the different sources of data are consistent. I also reviewed the ETL workflow and learned how to optimize the workflow and data management. The fifth iteration on deployment helped in the assessment of my activities in the company, and it helped me to learn about the things that I did not perform well. I, in turn, got involved in working on the actual activities in the organization and completed my internship after meeting all the objectives for my internship in the company.
ADBIS (Conference), Morzy, T., Valduriez, P., Bellatreche, L., International Workshop on Big Data Applications and Principles, DCSA (Workshop), International Workshop on GPUs in Databases, & WISARD (Workshop). (2015). New trends in databases and information systems: ADBIS 2015 short papers and workshops, BigDap, DCSA, GID, MEBIS, OAIS, SW4CH, WISARD, Poitiers, France, September 8-11, 2015. Proceedings.
Albrecht, A., & Naumann, F. (2008). Managing ETL Processes. NTII, 8, 12-15.
Ankorion, I. (2005). Change Data Capture Efficient ETL for Real-Time BI. Information Management, 15(1), 36.
Wang, J. (2007). Data warehousing and mining: Concepts, methodologies, tools, and applications. Hershey: Igi Online.
Sherry Roberts is the author of this paper and a senior editor at MeldaResearch.Com.