smfaizalkhan/Sortable

The input consists of two files in JSON Lines format:
1) Products, containing product_name, manufacturer, model, announced-date
    Sample {"product_name":"Samsung_TL100","manufacturer":"Samsung","model":"TL100","announced-date":"2009-01-07T19:00:00.000-05:00"}
2) Listings, containing title, manufacturer, currency, price
    Sample {"title":"Samsung TL100 12.2 Megapixel Digital Camera - Black","manufacturer":"Samsung","currency":"USD","price":"50.00"}

On analysing the data, we can see that a product's product_name appears in the title key of its listings, with "_" replaced by a space.

  So, as a first approach, we can read each line of the products file and check it against the listings file like this:

	1) Read the product_name from the products file
	2) Replace "_" in product_name with a space
	3) Look for the resulting string in the title key of each line of the listings file
	4) If the title contains the string, fetch that line and add it to the JSON object.

  Doing so is time consuming, since the products file contains 743 records (lines) and the listings file contains 20196 records (lines).

  Searching every listing for every product takes 743 * 20196 string comparisons, roughly 15 million.
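
The naive approach described above can be sketched as follows. This is only an illustration, not code from the repository: the class name `NaiveMatch` and its `match` method are made up for the example, and the titles are passed in as already-extracted strings instead of being parsed from the JSON lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NaiveMatch {
    // Returns every listing title that contains the product name
    // once its underscores are replaced by spaces.
    static List<String> match(String productName, List<String> listingTitles) {
        String needle = productName.replace('_', ' ');
        List<String> hits = new ArrayList<>();
        for (String title : listingTitles) {   // one comparison per listing
            if (title.contains(needle)) {
                hits.add(title);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
                "Samsung TL100 12.2 Megapixel Digital Camera - Black",
                "Sony Cyber-shot DSC-W310 - Digital camera - compact");
        System.out.println(match("Samsung_TL100", titles));
        // Comparisons needed when every product is checked against every listing
        System.out.println(743L * 20196L);     // 15005628
    }
}
```

Calling `match` once per product is what makes the cost grow with the product of the two file sizes.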
  
Instead, if we build a data structure from the listings file, such as a map keyed on the title, then for every line read from the products file we can fetch the matching values from the map by passing in the key.

The issue is that a Map does not allow duplicate keys, and many listings share the same key. Google's Guava library provides a Multimap, which can hold several values under one key.
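
To show what "several values under one key" means, here is a small stand-in built only on the standard library. It is not Guava and not the repository's code: the class `MultiValueIndex` is hypothetical, and the second title is invented for the example. With Guava on the classpath, a `Multimap` offers the same `put` and `get` idea without the wrapper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MultiValueIndex {
    private final Map<String, List<String>> index = new HashMap<>();

    // Appends the value to the list stored under the key,
    // creating the list on first use.
    void put(String key, String value) {
        index.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Returns all values stored under the key, or an empty list.
    List<String> get(String key) {
        return index.getOrDefault(key, Collections.emptyList());
    }

    public static void main(String[] args) {
        MultiValueIndex byBrand = new MultiValueIndex();
        byBrand.put("Samsung", "Samsung TL100 12.2 Megapixel Digital Camera - Black");
        byBrand.put("Samsung", "Samsung TL100 Digital Camera - Silver"); // invented title
        System.out.println(byBrand.get("Samsung").size()); // 2
        System.out.println(byBrand.get("Casio").size());   // 0
    }
}
```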

Process involved

  1) Build the data structures from the listings file (mapListings())

        a) Take the first word of the title and map the title to it (titleKeyMap)
        b) Take the title and map the full listing line to it (titleValueMap)
       
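           while ((lines = br.readLine()) != null) {
				JSONObject jsonObj = (JSONObject) jsonParser.parse(lines);
				title = jsonObj.get("title").toString();
				// First word of the title, or the whole title if it has no space
				titleFirstWord = title.substring(0, (title.contains(" ") ? title.indexOf(" ") : title.length()));
				// Upper-case only the first character; String.replace(char, char)
				// would also change later occurrences of that character
				keyWord = titleFirstWord.isEmpty() ? titleFirstWord
						: Character.toUpperCase(titleFirstWord.charAt(0)) + titleFirstWord.substring(1);
				titleKeyMap.put(keyWord, "@" + title);
				titleValueMap.put(title, lines);
			}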

	So now titleKeyMap contains keys such as Sony, Casio and Samsung, each with multiple titles as values,
	and titleValueMap contains each title as key and the full listing line as value.
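
Capitalizing only the first letter of the key needs some care, because `String.replace(char, char)` replaces every occurrence of the character, not just the first. The sketch below illustrates the difference. The helper class `TitleKey` and the lower-case title are assumptions made for the example, not part of the repository.

```java
class TitleKey {
    // First word of the title, or the whole title if it has no space.
    static String firstWord(String title) {
        int space = title.indexOf(' ');
        return space < 0 ? title : title.substring(0, space);
    }

    // Upper-cases the first character only and leaves the rest untouched.
    static String capitalize(String word) {
        if (word.isEmpty()) {
            return word;
        }
        return Character.toUpperCase(word.charAt(0)) + word.substring(1);
    }

    public static void main(String[] args) {
        // replace(char, char) touches every 'n', including the last one
        System.out.println("nikon".replace('n', 'N'));                    // NikoN
        // hypothetical lower-case title
        System.out.println(capitalize(firstWord("nikon coolpix s3000"))); // Nikon
    }
}
```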

   2) Read the products file (readProductsFromFile())

       a) Read each line of the file products.txt
       b) For each line, look up its occurrences in the mapped data structures (titleKeyMap and titleValueMap)
       
           try {
			BufferedReader br = new BufferedReader(new FileReader("./products.txt"));
			while ((lines = br.readLine()) != null) {
				JSONObject jsonObj = (JSONObject) jsonParser.parse(lines);
				productName = (jsonObj.get("product_name").toString());
				// Lookup key: the text before the first "_" (or "-" when there is no "_")
				searchProductName = productName.substring(0,
						(productName.contains("_") ? productName.indexOf("_") : productName.indexOf("-")));
				// Form expected in listing titles: separator replaced by a space
				replacedProductName = productName.replace((productName.contains("_") ? "_" : "-"), " ");
				// All titles stored under the key, split on the "@" prefix
				titleValue = new StringTokenizer(titleKeyMap.get(searchProductName).toString(), "@");
				this.printValues(titleValue, productName, replacedProductName, titleValueMap, ps);
			}
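
The two strings derived from product_name in the loop above can be isolated in small helpers, as in this sketch. The class `ProductKey` and its method names are hypothetical. Unlike the excerpt, the sketch falls back to the whole name when it contains neither "_" nor "-", which is an added assumption about how such names should be handled.

```java
class ProductKey {
    // Text before the first "_" (or before the first "-" if there is no "_");
    // the whole name if neither separator is present.
    static String searchKey(String productName) {
        int underscore = productName.indexOf('_');
        int cut = underscore >= 0 ? underscore : productName.indexOf('-');
        return cut < 0 ? productName : productName.substring(0, cut);
    }

    // The form expected inside listing titles: "_" replaced by spaces,
    // or "-" replaced by spaces when the name has no "_".
    static String displayName(String productName) {
        return productName.replace(productName.contains("_") ? "_" : "-", " ");
    }

    public static void main(String[] args) {
        System.out.println(searchKey("Sony_Cyber-shot_DSC-W310"));   // Sony
        System.out.println(displayName("Sony_Cyber-shot_DSC-W310")); // Sony Cyber-shot DSC-W310
        System.out.println(searchKey("Samsung_TL100"));              // Samsung
    }
}
```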

    3) Check the occurrences among the passed values (printValues(), called from readProductsFromFile())

	       public void printValues(StringTokenizer listingsValue, String productName, String replacedProductName,
	 			Multimap<String, String> titleValueMap, PrintWriter ps)

       
             a) listingsValue contains the titles of the candidate listings.
             b) Each title is passed to titleValueMap as the key, so we get the listing lines.
             c) productName is added to the JSON object as product_name, and the matched listing lines as a JSON array.
             d) The output is required in the format
                    {
                        "product_name": String
                        "listings": Array[Listing]
                    }
             e) Since a JSON object does not guarantee key order, we add the fields to a LinkedHashMap and pass that map object to the JSON serializer.

		JSONArray listings = new JSONArray();
		// LinkedHashMap keeps product_name ahead of listings in the output
		Map obj = new LinkedHashMap();
		obj.put("product_name", productName);
		while (listingsValue.hasMoreElements()) {
			tokenVal = listingsValue.nextElement().toString();
			// Drop the last two characters (the ", " separator left by the collection's toString())
			tokenValue = tokenVal.substring(0, ((tokenVal.length() > 2) ? (tokenVal.length() - 2) : tokenVal.length()));
			if (tokenVal.contains(replacedProductName)) {
				arrayValue = titleValueMap.get(tokenValue).toString();
				arrayValue = arrayValue.replaceAll((arrayValue.contains("[") ? "\\[" : "\\]"), "");
				// Strip the double quotes and skip values that were already added
				String cleaned = arrayValue.replaceAll("\"", "");
				if (!listings.contains(cleaned)) {
					listings.add(cleaned);
				}
			}
		}

		obj.put("listings", listings);
		ps.println(JSONObject.toJSONString(obj));
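
The point about key order can be checked in isolation. The sketch below uses only the standard library and a hypothetical class `OrderedResult`. It shows that a `LinkedHashMap` iterates its keys in insertion order, which is the property the output relies on to place product_name before listings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OrderedResult {
    // Builds the result record with product_name inserted before listings.
    static Map<String, Object> build(String productName, List<String> listings) {
        Map<String, Object> result = new LinkedHashMap<>();
        result.put("product_name", productName);
        result.put("listings", listings);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> result = build("Samsung_TL100", new ArrayList<>());
        // Keys come back in the order they were inserted
        System.out.println(new ArrayList<>(result.keySet())); // [product_name, listings]
    }
}
```

A plain `HashMap` makes no such promise about iteration order, so the two keys could come out in either order.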

This gives the desired output.

Sample:

{"product_name":"Sony_Cyber-shot_DSC-W310","listings":["{title:Sony Cyber-shot DSC-W310 - Digital camera - compact - 12.1 Mpix - optical zoom: 4 x - supported memory: MS Duo, SD, MS PRO Duo, MS PRO Duo Mark2, SDHC, MS PRO-HG Duo - silver,manufacturer:Sony,currency:USD,price:108.75}, {title:Sony Cyber-shot DSC-W310 - Digital camera - compact - 12.1 Mpix - optical zoom: 4 x - supported memory: MS Duo, SD, MS PRO Duo, MS PRO Duo Mark2, SDHC, MS PRO-HG Duo - silver,manufacturer:Sony,currency:USD,price:113.55}]"]}