Mansoor Ahmed (Mansoor1565)



0 following · 0 stars

Location: Faisalabad

Home Page: https://www.technologiesinindustry4.com


Mansoor Ahmed's repositories

The-Hull-White-model

Introduction

The Hull-White model is a financial model of future interest rates, implemented here in Python. It belongs to the class of no-arbitrage models, which are able to fit today's term structure of interest rates. The Hull-White model is comparatively straightforward to translate from its mathematical description of the evolution of future interest rates onto a tree or lattice, so interest rate derivatives such as Bermudan swaptions can be valued in the model. The first Hull-White model was described by John C. Hull and Alan White in 1990, and it remains quite widespread in the market today. In this article, we will explore the Hull-White model and run simulations with QuantLib Python.

Description

We can define the Hull-White short rate model by the stochastic differential equation:

dr(t) = (θ(t) − a·r(t)) dt + σ dW(t)

There is some uncertainty among practitioners about exactly which parameters in the model are time-dependent, and what name to apply to the model in each case. The most commonly used naming convention is the following: if θ has t (time) dependence, it is the Hull-White model; if θ and a are both time-dependent, it is the extended Vasicek model.

We use QuantLib to show how to simulate the Hull-White model and examine some of its properties. We import the libraries and set things up as described below:

```python
import QuantLib as ql
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
```

The constants for this example are all defined below. The variables sigma and a are the constants that define the Hull-White model. We discretize a time span of thirty years into 360 intervals, as defined by the timestep variable in the simulation. For simplicity we use a constant forward rate term structure as an input; it is straightforward to swap in another term structure here.
```python
sigma = 0.1
a = 0.1
timestep = 360
length = 30  # in years
forward_rate = 0.05
day_count = ql.Thirty360()
todays_date = ql.Date(15, 1, 2015)

ql.Settings.instance().evaluationDate = todays_date

spot_curve = ql.FlatForward(
    todays_date,
    ql.QuoteHandle(ql.SimpleQuote(forward_rate)),
    day_count,
)
spot_curve_handle = ql.YieldTermStructureHandle(spot_curve)

hw_process = ql.HullWhiteProcess(spot_curve_handle, a, sigma)
rng = ql.GaussianRandomSequenceGenerator(
    ql.UniformRandomSequenceGenerator(timestep, ql.UniformRandomGenerator())
)
seq = ql.GaussianPathGenerator(hw_process, length, timestep, rng, False)
```

The Hull-White process is constructed by passing the term structure, `a`, and `sigma`. To create the path generator, one has to provide a random sequence generator along with other simulation inputs such as `timestep` and `length`. A function to generate paths can be written as shown below:

```python
def generate_paths(num_paths, timestep):
    arr = np.zeros((num_paths, timestep + 1))
    for i in range(num_paths):
        sample_path = seq.next()
        path = sample_path.value()
        time = [path.time(j) for j in range(len(path))]
        value = [path[j] for j in range(len(path))]
        arr[i, :] = np.array(value)
    return np.array(time), arr
```

The simulation of the short rates looks as follows:

```python
num_paths = 10
time, paths = generate_paths(num_paths, timestep)
for i in range(num_paths):
    plt.plot(time, paths[i, :], lw=0.8, alpha=0.6)
plt.title("Hull-White Short Rate Simulation")
plt.show()
```

This produces a Monte Carlo simulation of the Hull-White short rate. Valuing vanilla instruments such as caps and swaptions is useful mainly for calibration. The real use of the model is to value somewhat more exotic derivatives, such as Bermudan swaptions on a lattice, or other derivatives in a multi-currency context such as quanto constant maturity swaps. These are explained, for instance, in Brigo and Mercurio (2001).
An efficient and accurate Monte Carlo simulation of the Hull-White model with time-dependent parameters can be performed along the same lines. For more details visit: https://www.technologiesinindustry4.com/2022/01/the-hull-white-model.html
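When QuantLib is not available, the same dynamics can be sketched with a plain NumPy Euler discretization of dr = (θ(t) − a·r) dt + σ dW. This is a sketch under a simplifying assumption: θ is held at the constant level a·theta_level, so the short rate simply mean-reverts toward theta_level (the constant-θ, Vasicek-style special case), rather than being calibrated to a full term structure as QuantLib does.

```python
import numpy as np

def simulate_hull_white(r0=0.05, a=0.1, sigma=0.1, theta_level=0.05,
                        years=30, timestep=360, num_paths=10, seed=42):
    """Euler scheme for dr = a*(theta_level - r) dt + sigma dW.

    Constant-theta special case of the Hull-White short-rate model;
    theta_level plays the role of the long-run mean of the short rate.
    """
    rng = np.random.default_rng(seed)
    dt = years / timestep
    paths = np.empty((num_paths, timestep + 1))
    paths[:, 0] = r0
    for j in range(timestep):
        dw = rng.standard_normal(num_paths) * np.sqrt(dt)  # Brownian increment
        drift = a * (theta_level - paths[:, j]) * dt       # mean reversion
        paths[:, j + 1] = paths[:, j] + drift + sigma * dw
    return np.linspace(0.0, years, timestep + 1), paths

time, paths = simulate_hull_white()
```

With many paths, the sample mean of the terminal short rate stays close to theta_level, which is a quick sanity check on the mean-reverting drift.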

Stargazers: 0 · Issues: 0

Archimedes-Discovery-of-Pi

Introduction

The Greek mathematician Archimedes's approximation of Pi is one of the most famous calculations in the history of mathematics. The number π (spelled "pi") is a mathematical constant, approximately equal to 3.14159. It is defined as the ratio of a circle's circumference to its diameter and is also called Archimedes' constant. π cannot be expressed as a common fraction; its decimal representation never ends, and supercomputers have carried it into trillions of digits after 3.14 without reaching an end. Ancient civilizations, including the Egyptians and Babylonians, required fairly accurate approximations of π for practical computations. Around 250 BC the Greek mathematician Archimedes created an algorithm to approximate π with reasonable accuracy. In this article, we will discuss the history and importance of Pi, and also describe deep learning prediction of the digits of Pi.

Description

Archimedes (287-212 BCE) was a pioneer in mathematics and mechanical engineering. He is best known for formulating Archimedes' Principle, the law of buoyancy, but he observed many other laws of physics as well and recorded his observations as mathematical theorems. In his work on the Measurement of the Circle, Archimedes reached the conclusion that the ratio of a circle's circumference to its diameter is greater than 3 10/71 but less than 3 1/7. This is a very good estimate of the mathematical constant we today call "pi" (π).

History of Pi (π)

Pi (π) has been recognized for almost 4,000 years. The ancient Babylonians calculated the area of a circle by taking 3 times the square of its radius, which gives a value of π = 3. One Babylonian tablet shows a value of 3.125 for π, a closer approximation. The Egyptians computed the area of a circle by a formula that implied an approximate value of 3.1605 for π. The first rigorous calculation of π was completed by Archimedes of Syracuse.
Archimedes approached the area of a circle using the Pythagorean theorem to find the areas of two regular polygons: a polygon inscribed inside the circle and a polygon around which the circle was circumscribed. Since the actual area of the circle lies between the areas of the inscribed and circumscribed polygons, the areas of the polygons give lower and upper bounds for the area of the circle. Archimedes recognized that he had not found the value of π, only an estimate within those limits. In this way, Archimedes showed that π is between 3 10/71 and 3 1/7.

Importance of Pi

The number π is one of the most significant numbers in our universe. Humankind's journey to calculate, approximate, and understand this mysterious number spans the ages and transcends cultures. It is an essential concept that appears in all areas of mathematics. Pi helps in understanding universal truths and many mathematical ideas: values of π are used in trigonometry and geometry, and in advanced topics such as probability, statistics, and complex numbers. Pi is a well-known mathematical constant used all over the world; because a circle contains 360°, π is used whenever working with the 360° of a circle, and it is often referred to as the circle constant.

Deep learning prediction of the digits of Pi

A sufficiently big neural network can, in principle, mimic one of the several algorithms for producing the digits of π. It would need to be big enough to accommodate the necessary logic and memory for variables. However, there is nothing essentially unpredictable about those digits: they are perfectly computable, with really quite short computer programs. There is no real motive to attempt this, and no deep revelation to be expected from a neural network that learned to compute the digits of π.
It can be interesting from the viewpoint of studying the representational power of neural networks, and the quantity of data required to train them to reproduce a short algorithm. But neural networks are not a fit algorithmic paradigm for this kind of computation, and there is no reason to consider such an exercise insightful or valuable from a mathematical perspective. For more details visit: https://www.technologiesinindustry4.com
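Archimedes' polygon-doubling procedure described above can be sketched as a short recurrence on polygon perimeters. For a circle of diameter 1, the circumscribed and inscribed hexagon perimeters are 2√3 and 3; each doubling step replaces the circumscribed perimeter by the harmonic mean and the inscribed perimeter by the geometric mean:

```python
import math

def archimedes_pi_bounds(doublings=4):
    """Bound pi between the perimeters of inscribed and circumscribed
    regular polygons around a circle of diameter 1, starting from
    hexagons and doubling the number of sides each step."""
    upper = 2 * math.sqrt(3)  # circumscribed hexagon perimeter
    lower = 3.0               # inscribed hexagon perimeter
    for _ in range(doublings):
        # New circumscribed perimeter uses the OLD inscribed one;
        # new inscribed perimeter uses the NEW circumscribed one.
        upper = 2 * upper * lower / (upper + lower)  # harmonic mean
        lower = math.sqrt(upper * lower)             # geometric mean
    return lower, upper

lo, hi = archimedes_pi_bounds()  # 96-gon after 4 doublings
```

Four doublings take the hexagon to a 96-sided polygon, the figure Archimedes actually used, and reproduce his bounds 3 10/71 < π < 3 1/7.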

Stargazers: 0 · Issues: 0

AI-Research-SuperCluster

Introduction

Meta is building a gigantic supercomputer called the AI Research SuperCluster (RSC) to train artificial intelligence models. Meta claims that it will be the fastest AI supercomputer in the world when it is fully built in mid-2022. Meta said it will have 16,000 GPUs (graphics processors) and will be capable of five exaflops of computing performance: 5 quintillion operations per second. Meta said its researchers have begun using the supercomputer to train large models for natural language processing and computer vision, and that researchers will be able to use it to "seamlessly analyze text, images & video together" and come up with new augmented reality tools. In this post, we will present a brief description of this massive supercomputer.

Description

The AI Research SuperCluster is 20x faster than Meta's V100-based production clusters and 9x faster than Meta's V100-based research clusters. It is equipped to train models with trillions of parameters for natural language processing (NLP), and it can analyze images, text, and video across hundreds of different languages. According to Meta, its researchers have already started using RSC to train large models in NLP and large computer vision models for research; their eventual aim is to train models with trillions of parameters. The company thinks that RSC will help its AI researchers build new and better AI models that can learn from trillions of examples, work across hundreds of different languages, seamlessly analyze text, images, and video together, develop new augmented reality tools, and much more.

Advanced AI and Machine Learning Computations

The researchers say that they will be able to train the largest models required to develop advanced AI for computer vision, NLP, speech recognition, and more.
The Research SuperCluster will help build completely new AI systems that could, for example, power real-time voice translations for large groups of people who each speak a different language, so that they can seamlessly collaborate on a research project or play an AR game together. Eventually, the work done with RSC will lead toward building technologies for the next major computing platform, the metaverse, where AI-driven applications and products will play a key role.

Floating-point arithmetic

We need to understand that both conventional supercomputers and AI supercomputers make calculations using what is known as floating-point arithmetic, a mathematical shorthand that is very useful for calculations with very big and very small numbers. The "floating point" in question is the decimal point that floats between significant figures. The degree of precision deployed in floating-point calculations varies with the format used. The speed of the largest supercomputers is usually measured in 64-bit floating-point operations per second, but because AI calculations need less precision, AI supercomputers are frequently measured in 32-bit or even 16-bit FLOPs. That is why comparing the two types of systems is not necessarily apples to apples. However, this caveat does not diminish the incredible power and size of AI supercomputers. For more details visit: https://www.technologiesinindustry4.com
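The precision trade-off between formats can be seen directly with NumPy's floating-point types: machine epsilon grows as the bit width shrinks, which is why a FLOP count measured in 16-bit arithmetic is not comparable to one measured in 64-bit. A quick sketch:

```python
import numpy as np

# Machine epsilon: the gap between 1.0 and the next representable
# number in each format. Smaller epsilon means higher precision.
for dtype in (np.float64, np.float32, np.float16):
    info = np.finfo(dtype)
    print(f"{info.dtype}: eps={info.eps:.3e}, max={info.max:.3e}")

# The same decimal value 0.1 is stored with very different accuracy.
for dtype in (np.float64, np.float32, np.float16):
    x = dtype(0.1)
    print(f"{np.dtype(dtype)}: 0.1 stored as {float(x):.12f}")
```

Running this shows float16's epsilon is roughly twelve orders of magnitude larger than float64's, so a 16-bit "operation" carries far less information than a 64-bit one.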

Stargazers: 0 · Issues: 0

MapReduce-with-Python

Introduction

MapReduce with Python is a programming model that allows big volumes of data to be processed and generated by dividing the work into independent tasks, which are then performed in parallel across a cluster of machines. The MapReduce programming style was inspired by the functional programming constructs map and reduce, which are commonly used to process lists of data. At a high level, every MapReduce program transforms a list of input data elements into a list of output data elements twice: once in the map phase and once in the reduce phase. In this article, we will introduce the MapReduce programming model and describe how data flows through the different stages of the model.

Description

The MapReduce framework is composed of three major phases:

- Map
- Shuffle and sort
- Reduce

Map

The first phase of a MapReduce application is the map phase. Within the map phase, a function called the mapper processes a series of key-value pairs. The mapper processes each key-value pair separately, producing zero or more output key-value pairs. Consider, for instance, a mapper whose purpose is to turn sentences into words. The input to this mapper is strings that contain sentences; the mapper's function is to split each sentence on spaces, and the resulting output is the individual words from the mapper's input.

Shuffle and sort

The second phase of MapReduce is shuffle and sort. As the mappers begin completing, the intermediate outputs from the map phase are moved to the reducers. This process of moving output from the mappers to the reducers is known as shuffling. Shuffling is driven by a partition function, named the partitioner.
The partitioner is used to direct the flow of key-value pairs from mappers to reducers. It is given the mapper's output key and the number of reducers, and returns the index of the target reducer. The partitioner ensures that all of the values for the same key are sent to the same reducer. The default partitioner is hash-based: it computes a hash value of the mapper's output key and assigns a partition based on this result.

The last step before the reducers begin processing data is the sorting process. The intermediate keys and values for each partition are sorted by the Hadoop framework before being presented to the reducer.

Reduce

The third phase of MapReduce is the reduce phase. Within the reduce phase, a function known as the reducer is given an iterator of values: the set of all values for each unique key from the output of the map phase. The reducer aggregates the values for each unique key and produces zero or more output key-value pairs. Consider, as an example, a reducer whose purpose is to sum all of the values for a key. The input to this reducer is an iterator of all of the values for a key; the reducer sums these values and then outputs a key-value pair containing the input key and the sum of its values, for instance summing the values for the keys "cat" and "mouse".

A Python Example

The WordCount application can be used to demonstrate how the Hadoop streaming utility can run Python as a MapReduce application on a Hadoop cluster. Two Python programs follow: mapper.py and reducer.py. mapper.py implements the logic of the map phase of WordCount: it reads data from stdin, splits the lines into words, and outputs each word with its intermediate count to stdout.
The code below implements the logic of mapper.py:

```python
#!/usr/bin/env python
import sys

# Read each line from stdin
for line in sys.stdin:
    # Get the words in each line
    words = line.split()
    # Emit an intermediate count of 1 for each word.
    # In Hadoop streaming, the key is everything before the first tab
    # character and the value is everything after it.
    for word in words:
        print('{0}\t{1}'.format(word, 1))
```

reducer.py implements the logic of the reduce phase of WordCount. It reads the results of mapper.py from stdin, sums the occurrences of each word, and writes the result to stdout. The code below implements the logic of reducer.py:

```python
#!/usr/bin/env python
import sys

curr_word = None
curr_count = 0

# Process each key-value pair from the mapper
for line in sys.stdin:
    # Get the key and value from the current line
    word, count = line.split('\t')
    # Convert the count to an int
    count = int(count)
    # If the current word is the same as the previous word,
    # increment its count; otherwise print the previous word's
    # count to stdout
    if word == curr_word:
        curr_count += count
    else:
        # Write the word and its number of occurrences as a
        # key-value pair to stdout
        if curr_word:
            print('{0}\t{1}'.format(curr_word, curr_count))
        curr_word = word
        curr_count = count

# Output the count for the last word
if curr_word == word:
    print('{0}\t{1}'.format(curr_word, curr_count))
```

When the mapper and reducer programs perform correctly against tests, they can be run as a MapReduce application using the Hadoop streaming utility. Below is the command to run mapper.py and reducer.py on a Hadoop cluster:

```shell
$ $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/mapred/contrib/streaming/hadoop-streaming*.jar \
  -files mapper.py,reducer.py \
  -mapper mapper.py \
  -reducer reducer.py \
  -input /user/hduser/input.txt -output /user/hduser/output
```

For more details visit: https://www.technologiesinindustry4.com
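Before deploying to a cluster, the whole mapper → shuffle/sort → reducer flow can be exercised locally in plain Python. This is a sketch: `mapper`, `shuffle_sort`, and `reducer` below are in-process stand-ins for the streaming scripts and for Hadoop's shuffle, not part of Hadoop itself.

```python
from itertools import groupby

def mapper(lines):
    """Emit (word, 1) for every word, like mapper.py writes to stdout."""
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle_sort(pairs):
    """Sort intermediate pairs and group them by key, like Hadoop's
    shuffle-and-sort phase does between the mappers and reducers."""
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield key, [v for _, v in group]

def reducer(grouped):
    """Sum the counts for each word, like reducer.py."""
    for word, counts in grouped:
        yield word, sum(counts)

counts = dict(reducer(shuffle_sort(mapper(["the cat sat", "the mouse ran"]))))
print(counts)  # {'cat': 1, 'mouse': 1, 'ran': 1, 'sat': 1, 'the': 2}
```

Because the shuffle sorts by key, each word reaches the reducer as one contiguous group, which is exactly the property reducer.py relies on when it compares `word` against `curr_word`.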

Stargazers: 0 · Issues: 0

Elements-of-a-Metavers

Introduction

The Metaverse is the next evolution of digital technologies. It includes 3D virtualization and will transform digital technologies over the next 5-10 years. The elements of a Metaverse are considered very important in relation to Industry 4.0. A Metaverse will encompass numerous technologies, including the following:

- Digital currency
- Online shopping
- Workplace automation
- Social media
- Digital humans
- Natural language processing
- Infrastructure
- Device independence

In this article, we will look at what the Metaverse is and what its different elements are.

Description

The Metaverse is a combined virtual space, created by the convergence of virtually enhanced physical and digital reality. It is device-independent and is not owned by a single vendor. The Metaverse supports an independent virtual economy, enabled by digital currencies and non-fungible tokens (NFTs). It represents a combinatorial innovation, because it needs many technologies and trends to function. Contributing tech capabilities include:

- Augmented reality (AR)
- Flexible work styles
- Head-mounted displays (HMDs)
- An AR cloud
- The Internet of Things (IoT)
- 5G
- Artificial intelligence (AI)
- Spatial technologies

To better understand the concept of a Metaverse, consider it as the next version of the Internet. The Internet began as separate bulletin boards and independent online destinations; over time, these destinations developed into sites in a shared virtual space, much as a Metaverse will develop.

Importance of the Metaverse

There is a lot of excitement around the Metaverse, much of it driven by technology companies strategically claiming to be Metaverse companies. The Metaverse promises to improve or augment the digital and physical realities of people.
Furthermore, activities that now happen in siloed locations will eventually occur in a single Metaverse, for example:

- Buying clothes and accessories for online avatars
- Buying digital land and building virtual homes
- Taking part in virtual social experiences
- Shopping in virtual malls through immersive commerce
- Using virtual classrooms for immersive learning
- Purchasing digital art, collectibles, and assets (NFTs)
- Interacting with digital humans for onboarding and business connections

It is likely that a Metaverse will provide persistent, decentralized, collaborative, and interoperable opportunities, and will create business models that allow organizations to extend their digital business. Gartner, a technology research and consulting company, has described the elements of a Metaverse in a diagram.

Applications

Virtual reality

The social network company Facebook launched a social VR world named Facebook Horizon in 2019. In 2021, Facebook chairman Mark Zuckerberg announced that the company was committed to developing a metaverse. Several VR technologies promoted by Meta Platforms remain to be developed. Microsoft acquired the VR company AltspaceVR in 2017 and has since brought metaverse features, such as virtual avatars and meetings held in virtual reality, into Microsoft Teams. Future applications of metaverse technology include improving work output, shared learning environments, e-commerce, real estate, and fashion.

Video games

Many elements of metaverse technologies have already been developed within modern internet-enabled video games. Second Life combined several features of social media into a persistent three-dimensional world, with the user represented as an avatar. Social functions are frequently an integral feature of many massively multiplayer online games, and the social-based gameplay of Minecraft represents an innovative analog of a metaverse.
Hardware Technology

Entry points to the metaverse include general-purpose computers and smartphones, as well as augmented reality (AR), mixed reality, virtual reality (VR), and virtual world technologies. Dependence on VR technology has limited metaverse growth and wide-scale adoption. The limits of portable hardware and the need to balance cost and design have produced a shortage of high-quality graphics and mobility. Lightweight wireless headsets have struggled to achieve the retina-display pixel density required for visual immersion. Current hardware development is focused on overcoming the limitations of VR headsets and sensors, and on increasing immersion with haptic technology.

Software Technology

There has been no wide-scale adoption of a uniform technical specification for metaverse implementations; current implementations rely chiefly on proprietary technology. Interoperability is a major concern in metaverse development, and there have been a number of virtual-environment standardization projects. The metaverse has been described as a three-dimensional Internet populated by live people. The technology company NVIDIA announced in 2021 that it would adopt Universal Scene Description (USD) for its metaverse development tools. OpenXR is an open standard for access to virtual and augmented reality devices and experiences. It has been adopted by Microsoft for the HoloLens 2, by Meta Platforms for the Oculus Quest, and by Valve for SteamVR. For more details visit: https://www.technologiesinindustry4.com

Stargazers: 2 · Issues: 0

Elements-of-a-Metaverse


Stargazers: 2 · Issues: 0

Pig-and-Python

Introduction

Pig with Python is a widespread way of executing complex Hadoop map-reduce-based data flows. Pig adds a layer of abstraction on top of Hadoop's map-reduce mechanisms, with the intention of permitting developers to take a high-level view of the data and of the operations on that data. Pig enables us to do things more directly. For instance, we may join two or more data sources; writing a join as a map and a reduce function is a bit of a drag and is commonly worth avoiding. Pig is great because it simplifies complex tasks, offering a high-level scripting language that permits users to take more of a big-picture view of their data flow. Pig is particularly great because it is extensible, and this article will emphasize its extensibility. By the end of this article, we will be able to write Pig Latin scripts that execute Python code as part of a larger map-reduce workflow.

Description

Pig is composed of two main parts:

- A high-level data-flow language called Pig Latin.
- An engine that parses, optimizes, and executes Pig Latin scripts as a series of MapReduce jobs run on a Hadoop cluster.

Pig is easy to write, comprehend, and maintain, since it is a data-transformation language that allows the processing of data to be described as a sequence of transformations. It is also highly extensible through the use of User-Defined Functions (UDFs).

User-Defined Functions (UDFs)

A Pig UDF permits custom processing to be written in many languages, for example Python. A UDF is a function that is accessible to Pig but is written in a language that isn't Pig Latin. Pig permits us to register UDFs for use within a Pig Latin script; a UDF has to conform to a specific interface. One example of a Pig application is the Extract, Transform, Load (ETL) process, in which an application extracts data from a data source and transforms the data for querying and analysis purposes.
It then loads the result onto a target data store. When Pig loads the data, it may execute projections, iterations, and other transformations. UDFs allow more complex algorithms to be applied during the transform phase. After the data is done being processed by Pig, it may be stored back in HDFS.

Pig Latin scripts

We can write the simplest Python UDF as:

```python
from pig_util import outputSchema

@outputSchema('word:chararray')
def hi_world():
    return "hello world"
```

The data output from a function has a particular form. Pig likes it if we specify the schema of the data, because then it knows what it can do with that data. That is what the outputSchema decorator is for; there are a couple of different ways to specify a schema. If the function above were saved in a file named my_udfs.py, we would be able to make use of it in a Pig Latin script as:

```pig
-- first register it to make it available
REGISTER 'my_udfs.py' using jython as my_special_udfs

users = LOAD 'user_data' AS (name: chararray);
hello_users = FOREACH users GENERATE name, my_special_udfs.hi_world();
```

UDF arguments

A UDF has inputs and outputs as well. Look at the UDFs below:

```python
def deal_with_a_string(s1):
    return s1 + " for the win!"

def deal_with_two_strings(s1, s2):
    return s1 + " " + s2

def square_a_number(i):
    return i * i

def now_for_a_bag(lBag):
    lOut = []
    for i, l in enumerate(lBag):
        lNew = [i,] + l
        lOut.append(lNew)
    return lOut
```

The following Pig Latin script uses these UDFs:

```pig
REGISTER 'myudf.py' using jython as myudfs

users = LOAD 'user_data' AS (firstname: chararray, lastname: chararray, some_integer: int);

winning_users    = FOREACH users GENERATE myudfs.deal_with_a_string(firstname);
full_names       = FOREACH users GENERATE myudfs.deal_with_two_strings(firstname, lastname);
squared_integers = FOREACH users GENERATE myudfs.square_a_number(some_integer);

users_by_number = GROUP users by some_integer;
indexed_users_by_number = FOREACH users_by_number GENERATE group, myudfs.now_for_a_bag(users);
```

Beyond standard Python UDFs

We can't use NumPy from Jython.
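Incidentally, because pig_util only exists inside Pig's Jython runtime, UDFs like hi_world can still be unit-tested locally by stubbing outputSchema as a pass-through decorator. This is a sketch of a test harness, not part of Pig itself:

```python
# Stand-in for pig_util.outputSchema so UDFs can run outside Pig.
def outputSchema(schema):
    def decorator(func):
        func.output_schema = schema  # keep the schema for inspection
        return func
    return decorator

@outputSchema('word:chararray')
def hi_world():
    return "hello world"

@outputSchema('squared:int')
def square_a_number(i):
    return i * i

print(hi_world())          # hello world
print(square_a_number(4))  # 16
```

Since the stub preserves the declared schema on the function object, a test suite can also check that each UDF declares the schema the Pig Latin script expects.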
Moreover, Pig doesn't actually permit Python filter UDFs. We may only do things like:

```pig
user_messages = LOAD 'user_twits' AS (name: chararray, message: chararray);

-- add a field that says whether it is naughty (1) or not (0)
messages_with_rudeness = FOREACH user_messages GENERATE name, message, contains_naughty_words(message) as naughty;

-- then filter by the naughty field
filtered_messages = FILTER messages_with_rudeness by (naughty == 1);

-- and finally strip away the naughty field
rude_messages = FOREACH filtered_messages GENERATE name, message;
```

Python streaming UDFs

Pig lets us hook into the Hadoop Streaming API, which allows us to get around the Jython issue when we need to. Hadoop lets us write mappers and reducers in any language that gives us access to stdin and stdout, so that's pretty much any language we want: Python 3, or even COW. The following is a simple Python streaming script; let's call it simple_stream.py:

```python
#!/usr/bin/env python
import sys

for line in sys.stdin:
    if len(line) == 0:
        continue
    l = line.split()  # split the line by whitespace
    for i, s in enumerate(l):
        # emit a key-value pair for each word in the line
        print("{key}\t{value}".format(key=i, value=s))
```

The point is that Hadoop will run the script on each node, so the hashbang line (#!) needs to be valid on every node, each import statement must be valid on every node, and any system-level files or resources accessed inside the Python script must be accessible in the same way on every node. To use the simple_stream script:

```pig
DEFINE stream_alias 'simple_stream.py' SHIP('simple_stream.py');
user_messages = LOAD 'user_twits' AS (name: chararray, message: chararray);
just_messages = FOREACH user_messages generate message;
streamed = STREAM just_messages THROUGH stream_alias;
DUMP streamed;
```

The general format we are using is:

```pig
DEFINE alias 'command' SHIP('files');
```

The alias is the name used to access the streaming function from inside the Pig Latin script.
The command is the system command Pig will call when it needs to use the streaming function. Finally, SHIP tells Pig which files and dependencies it needs to distribute to the Hadoop nodes for the command to be able to work.
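Because the streaming contract is just lines in, tab-separated key-value lines out, the logic of simple_stream.py can be checked locally before shipping it. This is a sketch using an in-process stand-in for the stdin/stdout loop:

```python
def stream_words(lines):
    """Mimic simple_stream.py: emit 'index<TAB>word' for each word
    of each non-empty input line."""
    out = []
    for line in lines:
        if not line.strip():
            continue
        for i, word in enumerate(line.split()):
            out.append("{0}\t{1}".format(i, word))
    return out

for record in stream_words(["hello pig streaming"]):
    print(record)
```

If the stand-in produces the expected records, the real script should behave the same way under Hadoop streaming, assuming its hashbang and imports resolve on every node.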

Stargazers: 1 · Issues: 0

piaic-batch4-faisalabad

Class data for students of PIAIC Batch4 Faisalabad

Stargazers: 0 · Issues: 0

Assignment2

Python programming Assignment 2

Stargazers: 0 · Issues: 0